
Add KV cache quantization types #114

Merged · 2 commits merged into main from ryan/add-kv-cache-quantization-options on Jan 15, 2025

Conversation

ryan-the-crayon (Collaborator):

Adds KV cache quantization configuration

ryan-the-crayon requested a review from yagil on November 6, 2024.
neilmehta24 force-pushed the ryan/add-kv-cache-quantization-options branch from 3486867 to aa568e0 on January 15, 2025 at 20:02.
@@ -0,0 +1,14 @@
import { z } from "zod";
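The diff excerpt only surfaces the zod import. A plausible sketch of what the rest of the new 14-line file defined, assuming the value list mirrors llama.cpp's supported KV cache quantization types (the exact entries are an assumption, not taken from the diff):

import { z } from "zod";

// Assumed value list mirroring llama.cpp's KV cache quantization types;
// the actual entries in the PR may differ.
export const llmLlamaCacheQuantizationTypes = [
  "f32",
  "f16",
  "q8_0",
  "q4_0",
  "q4_1",
  "q5_0",
  "q5_1",
] as const;

export type LLMLlamaCacheQuantizationType =
  (typeof llmLlamaCacheQuantizationTypes)[number];

export const llmLlamaCacheQuantizationTypeSchema = z.enum(
  llmLlamaCacheQuantizationTypes,
);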

Member:

since we'll (probably?) have MLX types here too, let's drop Llama from the file name. This is the pattern we use in LLMLoadModelConfig.ts. Btw, shouldn't it just live in that file?

Member:

I deleted this file, but kept llama in the variable names since the MLX KV cache quantization implementation requires a different type.

Member:

In variable name 👍
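For context, a hypothetical illustration of why the llama and MLX backends would need distinct types, as the reply above notes: llama.cpp selects a KV cache quantization by type name, whereas MLX-style quantized caches are typically parameterized by bit width and group size. Neither shape below is taken from this PR:

// Name-based selection, as in the llama-specific type above.
type LlamaCacheQuantization = "f16" | "q8_0" | "q4_0";

// Parameter-based selection, as MLX typically exposes it.
interface MlxCacheQuantization {
  bits: number; // e.g. 4 or 8
  groupSize: number; // e.g. 64
}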

LLMLlamaCacheQuantizationType,
llmLlamaCacheQuantizationTypes,
llmLlamaCacheQuantizationTypeSchema,
} from "./llm/LLMLlamaCacheQuantizationType";
yagil (Member) commented on Jan 15, 2025:

Suggested change:
- } from "./llm/LLMLlamaCacheQuantizationType";
+ } from "./llm/LLMLlamaCacheQuantizationType.js";

but probably should move to LLMLoadModelConfig anyway

Member:

I moved these to LLMLoadModelConfig
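A minimal sketch of what the final arrangement might look like, assuming the type, value list, and schema now sit in LLMLoadModelConfig.ts and are consumed as optional fields on the load config (the field names here are illustrative, not confirmed by the diff):

// In LLMLoadModelConfig.ts, alongside the definitions moved from the
// deleted file. Separate K and V cache fields are an assumption.
export const llmLoadModelConfigSchema = z.object({
  // ...other load options...
  llamaKCacheQuantizationType: llmLlamaCacheQuantizationTypeSchema.optional(),
  llamaVCacheQuantizationType: llmLlamaCacheQuantizationTypeSchema.optional(),
});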

neilmehta24 requested a review from yagil on January 15, 2025 at 21:23.
neilmehta24 force-pushed the ryan/add-kv-cache-quantization-options branch from f9009cf to 6f1e8f5 on January 15, 2025 at 21:27.
neilmehta24 force-pushed the ryan/add-kv-cache-quantization-options branch from 6f1e8f5 to 13c4a57 on January 15, 2025 at 21:58.
neilmehta24 merged commit d9f72c0 into main on Jan 15, 2025.
neilmehta24 deleted the ryan/add-kv-cache-quantization-options branch on January 15, 2025.